Dynamic High Dimensional Data Mapping for Efficient Similarity Query Processing

Authors

  • Xiangmin Zhou
  • Guoren Wang
  • Xiaofang Zhou
Abstract

For efficient processing of similarity queries, the search space is often reduced by pruning inactive query subspaces, which contain no query results, so that only the active query subspaces, which may contain query results, are examined. Not all active query subspaces contain query results, however; an active query subspace that later turns out to contain no query results is called a false active query subspace. The performance of similarity query processing degrades in the presence of false active query subspaces. This problem becomes more serious for high dimensional data with non-uniform distribution. Our experiments show that the number of accesses to false active subspaces increases as the number of dimensions increases. To overcome this problem, we propose a space mapping approach that reduces such unnecessary data accesses. A given query space can be refined by filtering within its mapped space. A mapping strategy, maxgap, is proposed to improve the efficiency of refinement processing. Based on this refinement method, an index structure called the MS-tree, together with algorithms for index construction and query processing, is designed and implemented. The performance of the MS-tree on range queries is compared with that of a number of existing methods using a real data set.
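The distinction between inactive, active, and false active query subspaces can be illustrated with a minimal sketch. It is not the MS-tree or the maxgap mapping, neither of which is specified in the abstract; it assumes, purely for illustration, that each subspace is a list of points summarized by an axis-aligned bounding box and that the query is a Euclidean range query. The names `range_query`, `bounding_box`, and `min_dist_to_box` are hypothetical.

```python
import math


def min_dist_to_box(q, lo, hi):
    """Smallest Euclidean distance from point q to the axis-aligned box [lo, hi]."""
    return math.sqrt(
        sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))
    )


def bounding_box(points):
    """Per-dimension minimum and maximum of a non-empty list of points."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return lo, hi


def range_query(subspaces, q, radius):
    """Filter-and-refine range query over a list of subspaces (lists of points).

    Assumption for this sketch: a subspace is pruned as inactive when its
    bounding box lies farther than `radius` from q. Every other subspace is
    active and is scanned; an active subspace that yields no result is
    counted as false active.
    """
    results = []
    stats = {"inactive": 0, "active": 0, "false_active": 0}
    for points in subspaces:
        lo, hi = bounding_box(points)
        if min_dist_to_box(q, lo, hi) > radius:
            stats["inactive"] += 1  # pruned without touching the points
            continue
        stats["active"] += 1
        hits = [p for p in points if math.dist(q, p) <= radius]
        if not hits:
            stats["false_active"] += 1  # scanned, but contributed nothing
        results.extend(hits)
    return results, stats


if __name__ == "__main__":
    subspaces = [
        [(0.0, 0.0), (1.0, 1.0)],    # contains results
        [(10.0, 10.0), (11.0, 11.0)],  # far away: inactive
        [(2.0, -1.0), (-1.0, 2.0)],  # box covers q, but no point is close
    ]
    print(range_query(subspaces, (0.5, 0.5), 0.8))
```

In this toy input the third subspace is false active: its bounding box contains the query point, so it cannot be pruned, yet scanning it returns nothing. Reducing such wasted scans is the goal the abstract states for filtering in the mapped space.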

Similar resources

MUD: Mapping-based query processing for high-dimensional uncertain data

Many real-world applications require management of uncertain data that are modeled as objects in high-dimensional space with imprecise values. In such applications, data objects are typically associated with probability density functions. A fundamental operation on such uncertain data is the probabilistic-threshold range query (PTRQ), which retrieves the objects appearing in the query region wi...

Advanced indexing and query processing for multidimensional databases

Many new applications, such as multimedia databases, employ the so-called feature transformation which transforms important features or properties of data objects into high-dimensional points. Searching for ‘similar’ or ‘nondominated’ objects based on these features is thus a search of points in this feature space. To support efficient query processing in these high dimensional databases, hig...

Efficient Similarity Search by Summarization in Large Video Database

With the explosion of video data, video processing technologies have advanced quickly and been applied in many fields, such as advertising and medicine. To search these video data quickly, an important issue is to organize videos effectively by data compacting and indexing. In practice, however, many useful distances for video comparison suit human perception but are non-metric. Ther...

Fast Nearest Neighbor Search in High-Dimensional Space

Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor se...

Fast Nearest-Neighbor Search Algorithms Based on High-Multidimensional Data

Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore pre-compute the result of any nearest-neighbor s...

Publication date: 2005